Cross-lingual transfer learning and multitask learning for capturing multiword expressions

Author: Ha Le An
Rohanian Omid
Taslimipoor Shiva
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

This is an accepted manuscript of an article published by Association for Computational Linguistics in Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), available online: https://www.aclweb.org/anthology/W19-5119 The accepted version of the publication may differ from the final published version.
Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks. We specifically predict two types of labels: MWE and dependency parse. Our neural MTL architecture utilises the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieved higher performance compared to standard neural approaches.

Crossref

Wolverhampton Intellectual Repository and E-theses

Advances in automatic terminology processing: methodology and applications in focus

Author: Ha Le An
Publication venue: University of Wolverhampton
Publication date: 13/07/2023

A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
The information and knowledge era in which we are living creates challenges in many fields, and terminology is no exception. The challenges include exponential growth in the number of specialised documents in which terms are presented, and in the number of newly introduced concepts and terms, both of which are already beyond our (manual) capacity. A promising solution to this ‘information overload’ would be to employ automatic or semi-automatic procedures to enable individuals and/or small groups to efficiently build high-quality terminologies from their own resources which closely reflect their individual objectives and viewpoints. Automatic terminology processing (ATP) techniques have already proved to be quite reliable, and can save human time in terminology processing. However, they are not without weaknesses, one of which is that these techniques often consider terms to be independent lexical units satisfying some criteria, when terms are, in fact, integral parts of a coherent system (a terminology). This observation is supported by the discussion of the notion of terms and terminology and the review of existing approaches in ATP presented in this thesis. In order to overcome the aforementioned weakness, we propose a novel methodology in ATP which is able to extract a terminology as a whole. The proposed methodology is based on knowledge patterns automatically extracted from glossaries, which we consider to be valuable but overlooked resources. These automatically identified knowledge patterns are used to extract terms, their relations and descriptions from corpora. The extracted information can facilitate the construction of a terminology as a coherent system.
The study also aims to discuss applications of ATP, and describes an experiment in which ATP is integrated into a new NLP application: multiple-choice test item generation. The successful integration of the system shows that ATP is a viable technology, and should be exploited more by other NLP applications.

Wolverhampton Intellectual Repository and E-theses

Mutual terminology extraction using a statistical framework

Author: Ha Le An
Mitkov Ruslan
Pastor Gloria Corpas
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 16/06/2008

In this paper, we explore a statistical framework for mutual bilingual terminology extraction. We propose three probabilistic models to assess the proposition that automatic alignment can play an active role in bilingual terminology extraction and translate it into mutual bilingual terminology extraction. The results indicate that such models are valid and show that mutual bilingual terminology extraction is indeed a viable approach.

Wolverhampton Intellectual Repository and E-theses

Cognitive processing of multiword expressions in native and non-native speakers of English: evidence from gaze data

Author: Ha Le An
Rohanian Omid
Taslimipoor Shiva
Yaneva Victoria
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Gaze data has been used to investigate the cognitive processing of certain types of formulaic language such as idioms and binominal phrases; however, very little is known about the online cognitive processing of multiword expressions. In this paper we use gaze features to compare the processing of verb-particle and verb-noun multiword expressions to control phrases of the same part-of-speech pattern. We also compare the gaze data for certain components of these expressions and the control phrases in order to find out whether these components are processed differently from the whole units. We provide results for both native and non-native speakers of English and we analyse the importance of the various gaze features for the purpose of this study. We discuss our findings in light of the E-Z model of reading.

Wolverhampton Intellectual Repository and E-theses

Using gaze data to predict multiword expressions

Author: Ha Le An
Rohanian Omid
Taslimipoor Shiva
Yaneva Victoria
Publication venue: INCOMA Ltd
Publication date: 01/09/2017

In recent years gaze data has been increasingly used to improve and evaluate NLP models because it carries information about the cognitive processing of linguistic phenomena. In this paper we conduct a preliminary study towards the automatic identification of multiword expressions based on gaze features from native and non-native speakers of English. We report comparisons between a part-of-speech (POS) and frequency baseline and: i) a prediction model based solely on gaze data and ii) a combined model of gaze data, POS and frequency. In spite of the challenging nature of the task, best performance was achieved by the latter. Furthermore, we explore how the type of gaze data (from native versus non-native speakers) affects the prediction, showing that data from the two groups is discriminative to an equal degree. Finally, we show that late processing measures are more predictive than early ones, which is in line with previous research on idioms and other formulaic structures.

Wolverhampton Intellectual Repository and E-theses

Automatic question answering for medical MCQs: Can it go further than information retrieval?

Author: Ha Le An
Yaneva Viktoriya
Publication venue: RANLP
Publication date: 06/07/2019

We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.

Crossref

Wolverhampton Intellectual Repository and E-theses

SYNTHESIS OF COPPER-BASED NANOPARTICLE CATALYSTS BY DIFFERENT METHODS FOR TOTAL OXIDATION OF VOC

Author: Le Toan Minh
Pham Huu Thien
Than Quoc An Ha
Publication venue: 'Publishing House for Science and Technology, Vietnam Academy of Science and Technology'
Publication date: 13/09/2018

In this paper, the preparation of 10 wt.% Cu/γ-Al2O3 catalysts by different methods was studied. The changes in structure and texture of the catalysts were examined by X-ray diffraction (XRD), transmission electron microscopy (TEM) and Fourier-transform infrared spectroscopy (FT-IR). The activity of the catalysts was investigated in the complete oxidation of VOCs (toluene and n-butanol) in gas-phase reactions over the Cu/γ-Al2O3 catalyst. The results show that the catalyst prepared by non-thermal plasma (NTP) was superior to those obtained by impregnation (WI) and deposition-precipitation (DP), owing to the influence of copper nanoparticle size, which enhanced copper dispersion and selectivity. The total oxidation of VOCs to CO2 and H2O was achieved above 275 °C. Compared to the WI and DP methods, the NTP method increased the oxidation efficiency by 15-30%.

Vietnam Academy of Science and Technology: Journals Online

Corpora for Computational Linguistics

Author: Evans Richard
Ha Le An
Hasler Laura
Mitkov Ruslan
Orăsan Constantin
Publication venue: 'Universidade Federal de Santa Catarina (UFSC)'
Publication date: 01/01/2007

Since the mid-90s, corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.

Directory of Open Access Journals

Wolverhampton Intellectual Repository and E-theses

Double RIS-Assisted MIMO Systems Over Spatially Correlated Rician Fading Channels and Finite Scatterers

Author: Choi Wan
Le Ha An
Nguyen Van Duc
Van Chien Trinh
Publication date: 08/09/2023

This paper investigates double RIS-assisted MIMO communication systems over Rician fading channels with finite scatterers, spatial correlation, and the existence of a double-scattering link between the transceivers. First, the statistical information is derived in closed form for the aggregated channels, unveiling various influences of the system and environment on the average channel power gains. Next, we study two active and passive beamforming designs corresponding to two objectives. The first problem maximizes channel capacity by jointly optimizing the active precoding and combining matrices at the transceivers and passive beamforming at the double RISs subject to the transmit power constraint. To tackle this inherently non-convex problem, we propose an efficient alternating optimization (AO) algorithm based on the alternating direction method of multipliers (ADMM). The second problem enhances communication reliability by jointly training the encoder and decoder at the transceivers and the phase shifters at the RISs. An end-to-end learning framework, in which each system entity is represented by a neural network, is proposed to minimize the symbol error rate of the detected symbols by controlling the transceivers and the RIS phase shifts. Numerical results verify our analysis and demonstrate the superior improvements of phase shift designs to boost system performance.
Comment: 15 pages, 9 figures, accepted by IEEE Transactions on Communication

arXiv.org e-Print Archive